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Abstract 

We derive a majorization algorithm for the multidimensional scaling loss function 
qStress, with q small. 
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Note: This is a working paper which will be expanded/upclated frequently. All suggestions 
for improvement are welcome. The directory deleeuwpdx.net/pubfolders/qstress has a pdf 
version, the complete Rmd hie with all code chunks, the bib hie, and the R source code. 


1 Introduction 

We define qStress as 


cr q {x) := Wi(Si - ( x'Aix ) q ) 2 , ( 1 ) 

i= 1 

where the 5, are dissimilarities between pairs of objects and the djx) = x'AiX are squared 
Euclidean distances between pairs of points in M p . 
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The properties of qStress, and algorithms to minimize it for various values of q, have been 
discussed in Groenen and De Leeuw (2010), De Leeuw (2014), De Leeuw, Groenen, and Mair 
(2016b), De Leeuw, Groenen, and Mair (2016a), De Leeuw, Groenen, and Mair (2016c). In 
this paper we develop a new majorization algorithm for 0 < q < |, based on an elementary 
inequality for power funcions. 

Without loss of generality we assume 

n 

EM 2 = !• 

i —1 

We also expand the sum of squares to decompose qStress into two additive components that 
can be analyzed, and majorized, separately. The notation is the same as that used by De 
Leeuw (1977). 


a q (x) = 1 - 2 p q (x) + rj q (x), (2) 

where 


n 


Vq(x) 

:=^2wi(x'Aix) 2q , 
i= 1 

(3) 

Pq(x) 

n 

:= ^w^x' Aix) q . 

i= 1 

(4) 


2 Preliminaries 

The basis of our treatment is a family of simple inequalities for powers of non-negative 
numbers. 

Lemma 1: [Power] 

1. For z > 0 and r < 0 we have z r > rz + (1 — r). 

2. For z > 0 and 0 < r < 1 we have z r < rz + (1 — r). 

3. For z > 0 and r > 1 we have z r > rz + (1 — r). 

In all three cases we have equality if and only if z — 1. 

Proof: The derivative of f(z) = z r — rz vanishes at z — 1, and /(1) = 1 — r. Because 
/"(1) = r(r — 1) we see that / has a minimum at 1 if r < 0 or r > 1 and it has a maximum 
at 1 if 0 < r < 1. QED 

Equivalently, the inequalities follow from the convexity of z r for r > 1 or r < 0 and the 
concavity of z r for 0 < r < 1. 

We can apply lemma 1 to powers of ratios of quadratic forms. 

Lemma 2: [Forms] Suppose A is positive definite, x and y are arbitrary vectors, s and t 
are arbitrary real numbers. Then 
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(x'Ax) sr+t < r(x'Ax) s+t {y'Ay)< r ~V + (1 - r){x'Ax)\y'Ay) sr 


( 5 ) 


for 0 < r < 1 and 

(: x'Ax) sr+t > r{x'Ax) s+t {y'Ay) s(r ~^ + (1 - r){x'Ax) t (y'Ay) sr . (6) 

for r > 1 and r < 0. There is equality if and only if x'Ax = y'Ay. 

Proof: From lemma 1 

x'Ax\ s 
y' Ay ) 

with > or < depending on r. Thus 

( x'Ax) rs ^ r(x'Ax) s (y'Ay)< r -^ + (1 - r){y'Ay) rs , 
and multiplying both sides by {x'Ax) 1 gives the result. QED 

3 qStress 

In order to majorize a q we first minorize p q and then majorize r) q . 

Lemma 3: [Rho] If A is positive definite, x and y are arbitrary, and q < |, then 

(. x'Ax) q > (2 q — 1 ){x'Ax)(y'Ay) q ~ 1 + 2(1 — q){x'Ax)^(y'Ay) q ~^, 
with equality if and only if x'Ax = y'Ay. 

Proof: Take s — t — | and r = 2q — 1 in equation (6) in lemma 2. QED 
Lemma 4: [CS] If A is positive definite, x and y are arbitrary, and q < |, then 

{x'Ax) q > (2 q — l){x'Ax){y , Ay) q ~ 1 + 2(1 — q){x'Ay){y'Ay ) q ~ l , 

with equality if and only if x'Ax = y 1 Ay. 

Proof: By Cauchy-Schwartz {x'Ax)^ > {y'Ay)~^{x'Ay). Substitute this in lemma 3. QED 
Lemma 5: [Eta] If A is positive definite, x and y are arbitrary, and 0 < q < |, 

{x'Ax) 2q < 2 q(x'Ax)(y , Ay) 2q - 1 + (1 — 2q)(y'Ay) 2q , 

with equality if and only if x'Ax = y'Ay. 

Proof: Take s — 1 and t = 0 and r = 2 q in equation (5) in lemma 2. QED 
For the next theorem, our main majorization result, we define 
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( 7 ) 


s(y) := J2 w i(y'A l y) 2q 1 A U 

i= 1 
n 

T (y ) := '52w i 6 i (y'A i y) q - 1 A i . (8) 

i=i 

Theorem 1: [Majorize] For 0 < q < \ 

a q (x) < 1 + 7 {y) + x’V(y)x - 2 x'B(y)y, 

where 

V%) := 2{q , *S'(i/) — (2q — l)T(y)}, 

B(y) := 2(1 -q)T(y), 

and 

n 

7(y) := (1 - 2g) (H) 

2=1 

We have equality if and only if x'Ax = y'Ay. 

Proof: Substitute the inequalities in lemma 3 and lemma 5 in the defintion of a q and collect 
terms. QED 

Note that because q < | we have q(y'Aiy) 2 ^ 1 — (2 q — 1 )5i(y'A i y) q ~ 1 > 0, and thus V(y) is 
positive semi-definite for all y. The majorization function is a convex quadratic. 

4 Algorithm 

From theorem 1 the majorization algorithm, using the Moore-Penrose inverse, is 

x {k) = V(x {k) ) + B(x w )x {k \ (12) 

followed by the acceleration step (De Leeuw and Heiser 1980) 

x (fc+i) _ ax ( k ) _|_ (i _ a)x^ k \ (13) 

By default the acceleration parameter a is -1. 

Note that if q — | then 
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(9) 

( 10 ) 



n 

V(y) = S(y ) =J2 w i A i, 

i= 1 

B w= T (v ) = P h 7m A " 

which means that (12) and (13) define the standard smacof algorithm. Also note that if q ~ (1 
then V(y) ~ B(y), suggesting slow convergence. 

5 Example 

We use the color data from Ekrnan (1954), analyzing it in two dimensions with q = 
0.50,0.33,0.25,0.10. 

The minimum loss function values, and the number of iterations needed to compute them, are 

## 0.50 0.032566 12 

## 0.33 0.002572 47 

## 0.25 0.001910 81 

## 0.10 0.011123 670 

In all cases convergence was monotone and smooth, although for small values of q it is slow. 
The four configuration plots are 
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Figure 1: Ekman Configurations 

The plots of dissimilarities versus recovered powered squared distances, i.e. of 5i versus 
(. x'Aix ) q , are 
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Figure 2: Ekman Fitplots 

As a final application we approximate the minimization of the MULTISCALE loss function 
(Ramsay 1977) 

n 

&o(x) = 5>(log* - log (x'Aix)) 2 
2=1 


by using the approximation 


logo: ~ 



1 


This means minimizing, for a small q, 


&o(x) ps q 2 J2wi($i ~ ( x'Aix) q f. 
2=1 


Thus we can find an approximate solution by minimizing qStress, using Sf as data. Note that 
for small q all 5% will be close to one. 

For q — . 1 we need 1922 iterations to find an approximate minimum of Co equal to 0.3079881. 
The configuration is 
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Figure 3: Ekman MULTISCALE Configuration 

Note that the two concentric circles in the configuration are suggestive of the solution from 
multidimensional scaling when all dissimilarties are the same (De Leeuw and Stoop 1984). 



MULTISCALE 



Figure 4: Ekman MULTISCALE Fitplot 

Not surprisingly the algorithm concentrates on fitting the smaller dissimilarities. It is unclear, 
however, what the precise effect is of having all dissimilarities and transformed distances close 
to one. In our rStressMinO function the initial configuration is always found by classical 
scaling. This may not be a very good initial estimate if q is much smaller than |. Further 
research is needed into the frequency of local minima problems with q very small. 

6 Appendix: Code 

library (MASS) 

torgerson <- function(delta, p = 2) { 
doubleCenter <- function(x) { 
n <- dim(x) [1] 
m <- dim(x) [2] 
s <- sum(x) / (n * m) 
xr <- rowSums(x) / m 
xc <- colSums(x) / n 
return((x - outer(xr, xc, "+")) + s) 

} 

z <- 

eigen(-doubleCenter((as.matrix (delta) 2) / 2) , symmetric = TRUE) 
v <- pmax (z$values, 0) 
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return(z$vectors [, 1: p] 7 0 * 0 / 0 diag(sqrt (v[1:p]))) 

} 

enorm <- function (x, w = 1) { 

return (sqrt (sum (w * (x ~ 2)))) 

> 

sqdist <- function (x) { 
s <- tcrossprod (x) 
v <- diag (s) 

return (outer (v, v, "+") - 2 * s) 

} 

mkBmat <- function (x) { 
d <- rowSums (x) 
x <- -x 
diag (x) <- d 
return (x) 

} 

mkPower <- function (x, r) { 
n <- nrow (x) 

return (abs ((x + diag (n)) r) - diag(n)) 

} 

rStressMin <- 

function (delta, 

w = 1 - diag (nrow (delta)), 
xold = NULL, 

p = 2, 

r = 0.5, 
alpha = -1, 
eps = le-10, 
itmax = 100000, 
verbose = TRUE) { 
delta <- delta / enorm (delta, w) 
itel <- 1 

if (is.null (xold)) 

xold <- torgerson (delta, p = p) 
n <- nrow (xold) 
dold <- sqdist (xold) 

rold <- sum (w * delta * mkPower (dold, r)) 
nold <- sum (w * mkPower (dold, 2 * r)) 
sold <- 1 - 2 * rold + nold 
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repeat { 

pi <- mkPower (dold, r - 1) 

p2 <- mkPower (dold, (2 * r) - 1) 

gy <- (1 - (2 * r)) * sum (w * (dold ~ (2 * r))) 

by <- 2 * (1 - r) * mkBmat (w * delta * pi) 

vy <- 2 * mkBmat (w * ((r * p2) + (1 - (2 * r)) * delta * pi)) 

xnew <- ginv (vy) °/ 0 *°/o by %*% xold 

xnew <- alpha * xold + (1 - alpha) * xnew 

dnew <- sqdist (xnew) 

rnew <- sum (w * delta * mkPower (dnew, r)) 

nnew <- sum (w * mkPower (dnew, 2 * r)) 

snew <- 1 - 2 * rnew + nnew 
if (verbose) { 
cat ( 

formate (itel, width = 4, format = "d"), 
formate ( 
sold, 

digits = 10, 
width = 13, 
format = "f" 

), 

formate ( 
snew, 

digits = 10, 
width = 13, 
format = "f" 

), 

"\n" 

) 

} 

if ((itel == itmax) II ((sold - snew) < eps)) 
break () 


itel 

<- 

itel + 1 

xold 

<- 

xnew 

dold 

<- 

dnew 

sold 

<- 

snew 

> 



return 

(list (x = xnew. 


d = dnew, 
delta = delta, 
sigma = snew, 
itel = itel)) 
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7 NEWS 

001 03/11/16 

• First Upload 
002 03/11/16 

• Corrected bug in formula for V(y) 

003 03/12/16 

• Small corrections and additions 
004 03/12/16 

• Possibility to choose initial configuration 

• acceleration factor 

005 03/14/16 

• rStressMin also returns d and delta 
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